A unified approach for mining outliers

نویسندگان

  • Edwin M. Knorr
  • Raymond T. Ng
چکیده

This paper deals with nding outliers (exceptions) in large datasets. The identiication of outliers can often lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the analysis of performance statistics of professional athletes. One contribution of this paper is to show how our proposed, intuitive notion of outliers can unify or generalize many of the existing notions of outliers provided by discordancy tests for standard statistical distributions. Thus, when mining large datasets containing many attributes, a uniied approach can replace many statistical discordancy tests, regardless of any knowledge about the underlying distribution of the attributes. A second contribution of this paper is the development of an algorithm to nd all outliers in a dataset. An important advantage of this algorithm is that its time complexity is linear with respect to the number of objects in the dataset. We include preliminary performance results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Local multivariate outliers as geochemical anomaly halos indicators, a case study: Hamich area, Southern Khorasan, Iran

Anomaly recognition has always been a prominent subject in preliminary geochemical explorations. Among the regional geochemical data processing, there are a range of statistical and data mining techniques as well as different mapping methods, which serve as presentations of the outputs. The outlier’s values are of interest in the investigations where data are gathered under controlled condition...

متن کامل

A Unified Notion of Outliers: Properties and Computation

As said in signal processing, “One person’s noise is another person’s signal.” For many applications, such as the exploration of satellite or medical images, and the monitoring of criminal activities in electronic commerce, identifying exceptions can often lead to the discovery of truly unexpected knowledge. In this paper, we study an intuitive notion of outliers. A key contribution of this pap...

متن کامل

Testing the Exactitude of Estimation Methods in the Presence of Outliers: An accounting for Robust Kriging

Estimation of gold reserves and resources has been of interest to mining engineers and geologists for ages. The existence of outlier values shows the economic part of the deposits subject to the fact that don’t depend on the human or technical errors. The presence of these high values causes a pseudo dramatically increment in variance estimation of economical blocks when applying conventional m...

متن کامل

Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis

Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...

متن کامل

A Unified Approach for Design of Lp Polynomial Algorithms

By summarizing Khachiyan's algorithm and Karmarkar's algorithm forlinear program (LP) a unified methodology for the design of polynomial-time algorithms for LP is presented in this paper. A key concept is the so-called extended binary search (EBS) algorithm introduced by the author. It is used as a unified model to analyze the complexities of the existing modem LP algorithms and possibly, help ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997